Data acquisition for improving machine learning models
Abstract
The vast advances in Machine Learning (ML) over the last ten years have been powered by the availability of suitably prepared data for training purposes. The future of the ML-enabled enterprise hinges on data. As such, there is already a vibrant market offering data annotation services to tailor sophisticated ML models. In this paper, we present research on the practical problem of obtaining data in order to improve the accuracy of ML models. We consider an environment in which consumers query for data to enhance the accuracy of their models, and data providers who possess data make them available for training purposes. We first formalize this interaction process, laying out a suitable framework and the associated parameters for data exchange. We then propose two data acquisition strategies that trade off between exploration, during which we obtain data to learn about the distribution of a provider's data, and exploitation, during which we optimize our data inquiries utilizing the gained knowledge. In the first strategy, Estimation and Allocation, we utilize queries to estimate the utilities of various predicates while learning about the distribution of the provider's data; we then proceed to an allocation stage in which those learned utility estimates inform our data acquisition decisions. The second algorithmic proposal, named Sequential Predicate Selection, utilizes a sampling strategy to explore the distribution of the provider's data, adaptively investing more resources in parts of the data space that are statistically more promising for improving overall model accuracy. We present a detailed experimental evaluation of our proposals on a variety of real data sets, exploring all applicable parameters of interest. We identify trade-offs and highlight the relative benefits of each algorithm to further…
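The abstract describes both acquisition strategies only in prose. As a rough illustration of the exploration/exploitation trade-off they share, the Python sketch below treats each predicate as an arm of a multi-armed bandit. It is a minimal sketch under stated assumptions, not the paper's algorithms: the provider function `acquire`, its Bernoulli reward (1 if a queried batch improved the model), and the names `estimate_then_allocate` and `sequential_selection` are all hypothetical, and Thompson sampling is swapped in as a generic stand-in for the paper's sampling strategy.

```python
import random


def acquire(predicate_probs, p, rng):
    """Hypothetical data provider (an assumption, not from the paper).

    One query under predicate p returns 1 if the acquired batch improved
    the model, which happens with unknown probability predicate_probs[p].
    """
    return 1 if rng.random() < predicate_probs[p] else 0


def estimate_then_allocate(predicate_probs, budget, explore_frac, rng):
    """Two-phase sketch: explore uniformly, then exploit the best estimate."""
    k = len(predicate_probs)
    assert budget >= k and 0 < explore_frac <= 1
    per_pred = max(1, int(budget * explore_frac) // k)
    wins = [0] * k
    total = 0
    # Estimation stage: query every predicate the same number of times.
    for p in range(k):
        for _ in range(per_pred):
            r = acquire(predicate_probs, p, rng)
            wins[p] += r
            total += r
    # Allocation stage: spend the remaining budget on the predicate with
    # the highest estimated utility (ties go to the lowest index).
    best = max(range(k), key=lambda p: wins[p])
    for _ in range(budget - k * per_pred):
        total += acquire(predicate_probs, best, rng)
    return total, best


def sequential_selection(predicate_probs, budget, rng):
    """Adaptive sketch using Thompson sampling with Beta(1, 1) priors."""
    k = len(predicate_probs)
    alpha = [1] * k  # 1 + observed successes per predicate
    beta = [1] * k   # 1 + observed failures per predicate
    pulls = [0] * k
    total = 0
    for _ in range(budget):
        # Sample a plausible utility for each predicate; query the largest.
        p = max(range(k), key=lambda i: rng.betavariate(alpha[i], beta[i]))
        r = acquire(predicate_probs, p, rng)
        alpha[p] += r
        beta[p] += 1 - r
        pulls[p] += 1
        total += r
    return total, pulls


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # hypothetical per-predicate utilities
    print(estimate_then_allocate(probs, 300, 0.2, random.Random(0)))
    print(sequential_selection(probs, 300, random.Random(0)))
```

With a small exploration fraction, `estimate_then_allocate` commits early and can lock onto a poorly estimated predicate; `sequential_selection` instead re-weighs every predicate after each query, which is one plausible reading of "adaptively investing more resources" in promising parts of the data space.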
Similar resources
Machine Learning Models for Housing Prices Forecasting using Registration Data
This article has been compiled to identify the best model of housing price forecasting using machine learning methods with maximum accuracy and minimum error. Five important machine learning algorithms are used to predict housing prices, including Nearest Neighbor Regression Algorithm (KNNR), Support Vector Regression Algorithm (SVR), Random Forest Regression Algorithm (RFR), Extreme Gradient B...
Improving the Performance of Machine Learning Algorithms for Heart Disease Diagnosis by Optimizing Data and Features
The heart is one of the most important organs of the body, and heart disease is the leading cause of death worldwide and in Iran. This is why early, timely diagnosis is one of the key foundations for preventing and reducing deaths from this disease. So far, many studies have been done on heart disease with the aim of prediction, diagnosis, and treatment. However, most of them have mostly f...
Selective Data Acquisition for Machine Learning
In many applications, one must invest effort or money to acquire the data and other information required for machine learning and data mining. Careful selection of the information to acquire can substantially improve generalization performance per unit cost. The costly information scenario that has received the most research attention (see Chapter X) has come to be called "active learning," and...
A New Approach to Credibility Premium for Zero-Inflated Poisson Models for Panel Data
The main objective of this research is to obtain and compare credibility premiums in count models with under-reporting for longitudinal data. In this research, predictive premiums are computed under squared-error and exponential loss functions and compared with each other. The desire to receive rewards and bonuses is one of the main reasons for not reporting accidents, and people often refrain from reporting low-cost accidents in order to benefit from discounts. In this research ...
Integrating Multiple Learned Models for Improving and Scaling Machine Learning
This paper presents the Plannett system, which combines artificial neural networks to achieve expert-level accuracy on the difficult scientific task of recognizing volcanos in radar images of the surface of the planet Venus. Plannett uses ANNs that vary along two dimensions: the set of input features used to train and the number of hidden units. The ANNs are combined simply by averaging their outpu...
Journal
Journal title: Proceedings of the VLDB Endowment
Year: 2021
ISSN: 2150-8097
DOI: https://doi.org/10.14778/3467861.3467872